Skip to content
This repository was archived by the owner on Jun 3, 2026. It is now read-only.

feat: Sandbox parent PR (base/shell/remote/docker/tools/plugins)#1011

Open
gautamsirdeshmukh wants to merge 1 commit into
strands-agents:mainfrom
gautamsirdeshmukh:feat/sandbox
Open

feat: Sandbox parent PR (base/shell/remote/docker/tools/plugins)#1011
gautamsirdeshmukh wants to merge 1 commit into
strands-agents:mainfrom
gautamsirdeshmukh:feat/sandbox

Conversation

@gautamsirdeshmukh

@gautamsirdeshmukh gautamsirdeshmukh commented May 6, 2026

Copy link
Copy Markdown
Contributor

NOTE - SPLITTING INTO MULTIPLE PRS


Description

Introduces Sandbox, a core primitive for the Strands Agent Harness SDK.

Sandbox decouples tool command execution, code execution and filesystem access from the agent's runtime. Tool logic still runs in the agent's host Node.js process (argument parsing, output formatting, error handling). However, the underlying operations they perform (spawning a shell, running code, file read/write, etc) are routed through the configured Sandbox backend.

When a user explicitly provides a Sandbox (Docker container, remote host via SSH, etc), the agent automatically gains four vended tools (exec, codeInterpreter, fileEditor) that route their operations through the Sandbox. This enables the same agent to execute locally during development or in an isolated/remote environment with a single config change.

When no Sandbox is provided, the agent behaves exactly as it does today (operating on the host).

Overview of Changes

Core Interface (src/sandbox/)

  • Sandbox abstract class – 7 abstract methods (executeStreaming, executeCodeStreaming, readFile, writeFile, removeFile, listFiles, statFile) + convenience wrappers (execute, executeCode, readText, writeText)
    • ShellSandbox – abstract base for shell-accessible backends (subclasses implement executeStreaming)
      • RemoteSandbox – SSH transport with accept-new host key default
      • DockerSandbox – container isolation with start/stop/pause/resume (snapshot)

Vended Tools (src/vended-tools/)

  • exec — shell command execution with cwd and timeout
  • codeInterpreter — execute code in any language via stdin piping
  • fileEditor — view, create, str_replace, insert, undo

All four are auto-registered when a Sandbox is explicitly configured on an agent. All four work in both sandbox mode (operations routed through sandbox.{method}()) and default mode (native host execution).

Plugin Integrations

  • SandboxStorage — new storage backend for ContextOffloader that writes offloaded content to the sandbox filesystem
  • AgentSkills – added method Skill.fromSandbox() for loading skills from the sandbox filesystem

Agent Integration

  • sandbox?: Sandbox | false on AgentConfig (defaults to disabled/preexisting behavior, same as false case)
  • Tools access sandbox via context.agent.sandbox

Related Issues

Documentation PR

Not yet

Type of Change

New feature

Testing

How have you tested the change?

  • I ran npm run check (added many unit tests, ran many demos with real agents)

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Examples

1. Agent with Sandbox — auto-registered tools, create and run a script
import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt:
        'You have access to a sandbox environment. ' +
        'You can run shell commands, execute code, and manage files.',
    })

    const result = await agent.invoke(
      'Create a file called hello.py with a hello world script, then run it and show me the output.'
    )
    console.log(`\n--- Agent result ---\n${result}`)
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

Sure! I'll do both steps simultaneously — creating the file and preparing to run it.
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
Now let's run it!
  ⏳ exec

🔧 Tool strands-agents/sdk-typescript#2: exec
✓ Tool completed
Here's a summary of what was done:

1. **Created** `/hello.py` with the following content:
   ```python
   print("Hello, World!")
  1. Ran the script using python3, which produced the output:
    Hello, World!
    

Everything worked perfectly! ✅

--- Agent result ---
Here's a summary of what was done:

  1. Created /hello.py with the following content:
    print("Hello, World!")
  2. Ran the script using python3, which produced the output:
    Hello, World!
    

Everything worked perfectly! ✅

</details>

<details>
<summary><strong>2. Docker Sandbox with Resource Limits</strong> — memory, CPU, PID, network, user</summary>

```typescript
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  // Basic Docker sandbox
  const sandbox = new DockerSandbox({
    image: 'python:3.12-slim',
    workingDir: '/workspace',
  })

  try {
    console.log('=== Basic execution ===')
    const result = await sandbox.execute('python3 -c "import sys; print(f\'Python {sys.version}\')"')
    console.log(result.stdout)

    console.log('=== File operations ===')
    await sandbox.writeText('data.txt', 'Hello from Docker sandbox!')
    const content = await sandbox.readText('data.txt')
    console.log(`Read back: ${content}`)

    console.log('=== Install and use a package ===')
    await sandbox.execute('pip install cowsay -q')
    const cowResult = await sandbox.executeCode("import cowsay; cowsay.cow('Moo from Docker!')", 'python3')
    console.log(cowResult.stdout)
  } finally {
    await sandbox.stop()
  }

  // Resource-limited sandbox (for untrusted workloads)
  console.log('\n=== Resource-limited sandbox ===')
  const limitedSandbox = new DockerSandbox({
    image: 'python:3.12-slim',
    memory: '256m',
    cpus: 0.5,
    pidsLimit: 50,
    network: 'none',
    user: '1000',
  })
  try {
    const memResult = await limitedSandbox.execute('cat /sys/fs/cgroup/memory.max 2>/dev/null || echo "cgroup v1"')
    console.log(`Memory limit: ${memResult.stdout.trim()}`)

    const whoResult = await limitedSandbox.execute('id')
    console.log(`Running as: ${whoResult.stdout.trim()}`)
  } finally {
    await limitedSandbox.stop()
  }
}

await main()

Output:

=== Basic execution ===
Python 3.12.13 (main, Apr 22 2026, 02:12:20) [GCC 14.2.0]

=== File operations ===
Read back: Hello from Docker sandbox!
=== Install and use a package ===
  ________________
| Moo from Docker! |
  ================
                \
                 \
                   ^__^
                   (oo)\_______
                   (__)\       )\/\
                       ||----w |
                       ||     ||


=== Resource-limited sandbox ===
Memory limit: 268435456
Running as: uid=1000 gid=0(root) groups=0(root)
3. Remote Sandbox (SSH) — same interface, different backend
import { Agent } from '@strands-agents/sdk'
import { RemoteSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new RemoteSandbox({
    host: 'user@dev-server.example.com',
    workingDir: '/home/user/workspace',
    identityFile: '~/.ssh/id_ed25519',
    sshOptions: ['ConnectTimeout=10', 'ServerAliveInterval=60'],
  })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt: 'You have access to a remote development server. Help the user with their tasks.',
    })

    const result = await agent.invoke('What OS is this machine running? List the files in the workspace.')
    console.log(`\n--- Agent result ---\n${result}`)
  } finally {
    await sandbox.stop()
  }
}

await main()
4. Shared Sandbox Between Agents — multi-agent collaboration via filesystem
import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sharedSandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const writerAgent = new Agent({
      sandbox: sharedSandbox,
      systemPrompt: 'You are a writer agent. Write files when asked.',
    })

    const readerAgent = new Agent({
      sandbox: sharedSandbox,
      systemPrompt: 'You are a reader agent. Read and summarize files when asked.',
    })

    // Writer creates a file
    console.log('--- Writer agent ---')
    await writerAgent.invoke('Write a haiku about TypeScript to a file called haiku.txt')

    // Reader reads the same file (shared sandbox!)
    console.log('\n--- Reader agent ---')
    const result = await readerAgent.invoke('Read haiku.txt and tell me what it says')
    console.log(`\nReader result: ${result}`)
  } finally {
    await sharedSandbox.stop()
  }
}

await main()

Output:

--- Writer agent ---

  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
The haiku has been written to `haiku.txt`! Here's what it says:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It captures TypeScript's essence — strong typing, early error detection, and the confidence it brings to development! 🎋

--- Reader agent ---
Let me read that file for you right away!
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#1: fileEditor
✓ Tool completed
The file contains a haiku about type safety in programming. Here it is:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It reflects on how type systems in programming languages help catch errors early — before the code even runs — bringing a sense of safety and structure to every line of code written.

Reader result: The file contains a haiku about type safety in programming. Here it is:

> *Types guard the code's flow,*
> *Errors caught before runtime,*
> *Safety in each line.*

It reflects on how type systems in programming languages help catch errors early — before the code even runs — bringing a sense of safety and structure to every line of code written.
5. Pause and Resume — snapshot state, destroy container, restore in new container
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  // Create sandbox and do some work
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  let snapshot
  try {
    // Install a package and create a file
    console.log('=== Setting up sandbox ===')
    await sandbox.execute('pip install requests -q')
    await sandbox.writeText('state.json', '{"step": 1, "data": "important"}')

    // Verify they exist
    const checkPip = await sandbox.executeCode('import requests; print(requests.__version__)', 'python3')
    console.log(`requests version: ${checkPip.stdout.trim()}`)
    const checkFile = await sandbox.readText('state.json')
    console.log(`state.json: ${checkFile}`)

    // Pause — creates a Docker image snapshot
    console.log('\n=== Pausing (docker commit) ===')
    snapshot = await sandbox.pause()
    console.log(`Snapshot: ${snapshot.data['imageId']}`)
  } finally {
    await sandbox.stop()
    console.log('Container destroyed.')
  }

  // Resume in a BRAND NEW container from the snapshot
  console.log('\n=== Resuming in new container ===')
  const resumed = new DockerSandbox({ image: 'python:3.12-slim', snapshot })

  try {
    // Verify state survived — this is a completely new container
    const state = await resumed.readText('state.json')
    console.log(`state.json still exists: ${state}`)

    const pipResult = await resumed.executeCode('import requests; print(requests.__version__)', 'python3')
    console.log(`requests still installed: ${pipResult.stdout.trim()}`)
  } finally {
    await resumed.stop()
  }
}

await main()

Output:

=== Setting up sandbox ===
requests version: 2.34.2
state.json: {"step": 1, "data": "important"}

=== Pausing (docker commit) ===
Snapshot: sha256:e2fa2a089e1a9734683130642d78aadd5cd2f8f6dcaaf4e3ef64d256952efc04
Container destroyed.

=== Resuming in new container ===
state.json still exists: {"step": 1, "data": "important"}
requests still installed: 2.34.2
6. All Vended Tools — exec, codeInterpreter, fileEditor (grep, glob, create, view, str_replace)
import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      systemPrompt: 'You have exec, codeInterpreter, and fileEditor tools available.',
    })

    // exec
    console.log('=== exec ===')
    await agent.invoke('Check what Python version is installed and list the contents of /workspace')

    // codeInterpreter
    console.log('\n=== codeInterpreter ===')
    await agent.invoke('Use the codeInterpreter tool to compute the first 10 Fibonacci numbers')

    // fileEditor: create + view
    console.log('\n=== fileEditor: create + view ===')
    await agent.invoke('Create a TypeScript file at /workspace/utils.ts with a function that reverses a string, then show me the file')

    // fileEditor: str_replace
    console.log('\n=== fileEditor: str_replace ===')
    await agent.invoke('Rename the function in /workspace/utils.ts to "reverseString"')

    // fileEditor: grep + glob
    console.log('\n=== fileEditor: grep + glob ===')
    await agent.invoke('Find all files in /workspace and search for the word "reverse" across them')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== exec ===
I'll check both at the same time!
  ⏳ exec

  ⏳ exec

🔧 Tool strands-agents/sdk-typescript#1: exec
🔧 Tool strands-agents/sdk-typescript#2: exec
✓ Tool completed
✓ Tool completed
Here are the results:

1. **Python Version:** Python **3.12.13** is installed.
2. **`/workspace` Contents:** The directory is currently **empty** — no files or folders were found.

Let me know if you'd like to do anything else!

=== codeInterpreter ===

  ⏳ codeInterpreter

🔧 Tool strands-agents/sdk-typescript#3: codeInterpreter
✓ Tool completed
Here are the first 10 Fibonacci numbers:

| Position | Value |
|----------|-------|
| 1        | 0     |
| 2        | 1     |
| 3        | 1     |
| 4        | 2     |
| 5        | 3     |
| 6        | 5     |
| 7        | 8     |
| 8        | 13    |
| 9        | 21    |
| 10       | 34    |

Each number is the sum of the two preceding ones, starting from **0** and **1**. Let me know if you'd like to compute more or explore further!

=== fileEditor: create + view ===
I'll create the file and read it back at the same time!
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#4: fileEditor
✓ Tool completed
File created! Now let me view it:
  ⏳ fileEditor

🔧 Tool strands-agents/sdk-typescript#5: fileEditor
✓ Tool completed

/workspace/utils.ts:
  function reverseString(str: string): string {
    return str.split("").reverse().join("");
  }

=== fileEditor: str_replace ===
The function is already named `reverseString`! No changes are needed. You can see it on **line 1** of the file:

```ts
function reverseString(str: string): string {

Let me know if you'd like to make any other changes!

=== fileEditor: grep + glob ===
I'll run both operations at the same time!
⏳ fileEditor
⏳ fileEditor

🔧 Tool #6: fileEditor
🔧 Tool #7: fileEditor
✓ Tool completed
✓ Tool completed
Here are the results:

Files in /workspace:

  • utils.ts (the only file present)

Occurrences of "reverse" in /workspace/utils.ts:

Line Content
1 function reverseString(str: string): string {
2 return str.split("").reverse().join("");
7 const reversed = reverseString(original);
9 console.log(\Reversed: ${reversed}`);`

The word "reverse" appears 4 times across the codebase — all within utils.ts. Let me know if you'd like to do anything else!

</details>

<details>
<summary><strong>7. ContextOffloader with SandboxStorage</strong> — offloaded content stays in the container</summary>

```typescript
import { Agent, tool } from '@strands-agents/sdk'
import { z } from 'zod'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'
import { ContextOffloader } from '@strands-agents/sdk/vended-plugins/context-offloader'

const generateReport = tool({
  name: 'generate_report',
  description: 'Generate a large report with lots of data.',
  inputSchema: z.object({}),
  callback: () => {
    return 'Line ' + Array.from({ length: 500 }, (_, i) => `Row ${i}: ${'data '.repeat(20)}`).join('\n')
  },
})

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    const agent = new Agent({
      sandbox,
      tools: [generateReport],
      plugins: [new ContextOffloader({ storage: sandbox, tokenThreshold: 100 })],
      systemPrompt: 'You can generate reports and retrieve offloaded content.',
    })

    // Generate a large tool result — it will be offloaded to sandbox storage
    console.log('=== Generating large tool result ===')
    await agent.invoke('Generate a report.')

    // Prove the offloaded content is inside the container, not on the host
    console.log('\n=== Where did the offloaded content go? ===')
    const files = await sandbox.listFiles('./artifacts')
    console.log(`Inside Docker container (./artifacts): [${files.map((f) => f.name).join(', ')}]`)

    const { existsSync } = await import('fs')
    console.log(`On host machine (./artifacts):          ${existsSync('./artifacts') ? 'EXISTS' : 'does not exist'}`)
    console.log('\n→ Offloaded content lives in the sandbox, not on your machine.')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== Generating large tool result ===
I'll generate that report for you right away!
  ⏳ generate_report

🔧 Tool strands-agents/sdk-typescript#1: generate_report
✓ Tool completed
The report has been generated! It's quite large (~13,700 tokens), so it was stored externally. Here's a summary of what was returned:

- **Format:** A data table with rows (Row 0, Row 1, Row 2, …), each containing 20 columns of data entries.
- **Size:** The report appears to contain a large number of rows with structured data fields.

Would you like me to:
1. **Retrieve a specific section** of the report (e.g., a range of rows)?
2. **Search for specific content** within the report (e.g., a keyword or pattern)?
5. **Retrieve the full report** in its entirety?

Let me know how you'd like to proceed!

=== Where did the offloaded content go? ===
Inside Docker container (./artifacts): [1779220618856_1_tooluse_UjeMvqTy54DZJFt0j3JEbF_0.txt]
On host machine (./artifacts):          does not exist

→ Offloaded content lives in the sandbox, not on your machine.
8. Skills Loaded from Sandbox Filesystem — SKILL.md files loaded via Skill.fromSandbox()
import { Agent } from '@strands-agents/sdk'
import { DockerSandbox } from '@strands-agents/sdk/sandbox'
import { AgentSkills } from '@strands-agents/sdk/vended-plugins/skills'

async function main() {
  const sandbox = new DockerSandbox({ image: 'python:3.12-slim' })

  try {
    // Write a skill that forces a very distinct response format
    await sandbox.execute('mkdir -p /workspace/skills/pirate')
    await sandbox.writeText(
      '/workspace/skills/pirate/SKILL.md',
      `---
name: pirate
description: Makes the agent respond as a pirate captain.
---

# Pirate Mode

When this skill is activated, you MUST:
1. Start every response with "Ahoy, matey!"
2. End every response with "⚓ Captain out."
4. Replace "yes" with "aye" and "no" with "nay"
6. Refer to code as "treasure" and bugs as "barnacles"
`
    )

    console.log('=== Skill written to sandbox filesystem ===')
    const files = await sandbox.listFiles('/workspace/skills/pirate')
    console.log(`Files: [${files.map((f) => f.name).join(', ')}]`)

    // Create agent with skills loaded from the sandbox
    const plugin = new AgentSkills({ skills: ['/workspace/skills'] })
    const agent = new Agent({
      sandbox,
      plugins: [plugin],
      systemPrompt: 'You are a helpful assistant. Activate skills when asked.',
    })

    // First: ask without skill — normal response
    console.log('\n=== Without skill (normal response) ===')
    await agent.invoke('Is TypeScript a good language?')

    // Now: activate the skill and ask again — pirate response
    console.log('\n=== With pirate skill activated ===')
    await agent.invoke('Activate the pirate skill, then tell me if TypeScript is a good language.')
  } finally {
    await sandbox.stop()
  }
}

await main()

Output:

=== Skill written to sandbox filesystem ===
Files: [SKILL.md]

=== Without skill (normal response) ===
Yes, TypeScript is widely considered a great language, especially for large-scale JavaScript development! Here's a quick overview of why it's well-regarded:

## ✅ Strengths of TypeScript

1. **Static Typing** — TypeScript adds optional static types to JavaScript, which helps catch bugs at *compile time* rather than *runtime*. This leads to more reliable code.

2. **Better Tooling & IntelliSense** — IDEs like VS Code provide excellent autocompletion, refactoring support, and error detection thanks to TypeScript's type system.

3. **Improved Readability & Maintainability** — Type annotations serve as built-in documentation, making code easier to understand and maintain, especially in large teams.

5. **Superset of JavaScript** — Any valid JavaScript is valid TypeScript, so adoption is gradual and low-risk. You can migrate existing JS projects incrementally.

7. **Modern Features** — TypeScript supports modern ECMAScript features (and even some ahead of browsers), compiling them down to compatible JavaScript.

8. **Large Ecosystem & Community** — Backed by **Microsoft**, TypeScript has massive adoption across the industry (Angular, NestJS, and many others use it by default).

9. **Interfaces & Generics** — These powerful features enable robust, reusable, and well-structured code patterns.

---

## ⚠️ Potential Downsides

- **Learning Curve** — Developers unfamiliar with typed languages may find it tricky at first.
- **Build Step Required** — TypeScript needs to be compiled to JavaScript, adding complexity to the toolchain.
- **Verbose at Times** — Type annotations can make code more verbose compared to plain JavaScript.
- **"Any" Escape Hatch** — Misuse of the `any` type can undermine the benefits of the type system.

---

## 🏁 Verdict

TypeScript is an **excellent choice** for most JavaScript projects, particularly:
- Large or growing codebases
- Team environments
- Long-term projects requiring maintainability

For small scripts or quick prototypes, plain JavaScript might still be more practical. But overall, TypeScript has become the **industry standard** for serious JavaScript development!

=== With pirate skill activated ===

  ⏳ skills

🔧 Tool strands-agents/sdk-typescript#1: skills
✓ Tool completed
Ahoy, matey!

Aye, TypeScript be a fine treasure indeed! Here's why this language be worth its weight in gold doubloons:

## ✅ Strengths of TypeScript

1. **Static Typing** — This treasure adds types to JavaScript, helpin' ye catch barnacles at *compile time* before they ever reach the open seas of runtime!

2. **Better Tooling & IntelliSense** — Yer trusty IDEs like VS Code give ye incredible autocompletion and barnacle detection, thanks to the type system.

3. **Improved Readability & Maintainability** — Type annotations be like a treasure map — they make yer treasure easier to read and understand, especially across a big crew.

4. **Superset of JavaScript** — Any JavaScript treasure be valid TypeScript! Ye can adopt it gradually without tossin' yer old treasure overboard.

6. **Modern Features** — TypeScript supports the latest ECMAScript features, compilin' them down for all ships (browsers) to sail smoothly.

7. **Large Ecosystem & Community** — Backed by the mighty **Microsoft**, this treasure has massive support across the seven seas of the industry!

8. **Interfaces & Generics** — Powerful tools for writin' robust, reusable treasure that stands the test of stormy seas.

---

## ⚠️ Potential Barnacles

- **Learning Curve** — Landlubbers unfamiliar with typed languages may struggle at first.
- **Build Step Required** — Ye need to compile yer treasure to JavaScript, addin' some rigging to yer ship.
- **Verbose at Times** — Type annotations can make yer treasure more wordy than plain JavaScript.
- **"Any" Escape Hatch** — Overuse of `any` can let barnacles creep back in and undermine the whole voyage!

---

## 🏁 Verdict

Aye, TypeScript be an **excellent treasure** for most JavaScript voyages, particularly:
- Large or growin' codebases
- Team crews workin' together
- Long-term projects that need to stay shipshape

For small scripts or quick voyages, plain JavaScript might serve ye better. But overall, TypeScript has become the **standard treasure** for serious sailors of JavaScript development!

⚓ Captain out.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


@github-actions github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 6, 2026
@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Issue: Missing package.json export entries for new vended tools

The PR description shows import paths like @strands-agents/sdk/vended-tools/run, @strands-agents/sdk/vended-tools/editor, @strands-agents/sdk/vended-tools/run-code, and @strands-agents/sdk/vended-tools/http, but these aren't added to package.json's exports map. Without these entries, the imports will fail at resolution time.

Suggestion: Add export entries in package.json for each new vended tool and also for ./sandbox:

"./sandbox": {
  "types": "./dist/src/sandbox/index.d.ts",
  "default": "./dist/src/sandbox/index.js"
},
"./vended-tools/run": {
  "types": "./dist/src/vended-tools/run/index.d.ts",
  "default": "./dist/src/vended-tools/run/index.js"
},
"./vended-tools/editor": {
  "types": "./dist/src/vended-tools/editor/index.d.ts",
  "default": "./dist/src/vended-tools/editor/index.js"
},
"./vended-tools/run-code": {
  "types": "./dist/src/vended-tools/run-code/index.d.ts",
  "default": "./dist/src/vended-tools/run-code/index.js"
},
"./vended-tools/http": {
  "types": "./dist/src/vended-tools/http/index.d.ts",
  "default": "./dist/src/vended-tools/http/index.js"
}

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Issue: Relationship between new sandbox-aware tools and existing vended tools is unclear

The SDK already ships bash (persistent session), file-editor (direct fs operations), and http-request (native fetch). This PR adds run, editor, runCode, and http which are sandbox-aware alternatives with overlapping functionality.

There's no documentation on:

  • When to use editor vs file-editor, or run vs bash, or http vs http-request
  • Whether the existing tools are deprecated
  • Migration path for users currently using the existing tools

Suggestion: Document the relationship between old and new tools — either in the PR description, TSDoc, or README. If the new tools are intended to replace the old ones, mark the old ones as @deprecated. If they coexist, clarify the tradeoffs (e.g., sandbox-aware = portable but requires sandbox setup, old tools = simpler for local-only use).

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Issue: This PR requires API bar raising review

Per the API Bar Raising guidelines, this is a substantial change introducing a new primitive/abstraction that customers are expected to frequently use. It adds:

  • 6 new public classes (Sandbox, LocalSandbox, ShellSandbox, RemoteSandbox, DockerSandbox, NullSandbox)
  • New property on the core Agent class (sandbox)
  • 4 new vended tools that depend on the abstraction
  • New sandbox property on the LocalAgent interface (breaking for existing implementations)

This warrants explicit review with at least 2 API reviewers. The PR should have the needs-api-review label.

Key API design questions for reviewers:

  1. Should Sandbox be an abstract class or an interface? (classes are harder to mock in tests)
  2. Should sandbox be required on Agent or opt-in? (currently defaults to LocalSandbox, adding overhead for agents that don't use sandbox tools)
  3. Is stream the right name for the primary abstract method? (it conflicts with Node.js stream concepts)
  4. Should Sandbox declare workingDir in the base?

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Issue: PR scope is very large and mixes unrelated changes

The diff is ~2500+ lines for the sandbox feature alone, but the full PR also includes unrelated changes to the agent core (interrupt system, concurrent tool execution via ToolExecutorStrategy, invocationState, modelState, BeforeModelCallEvent.cancel, token estimation, ModelPlugin, etc.). These are substantial independent features that make the sandbox review harder and increase merge risk.

Suggestion: Consider splitting this PR into:

  1. Core agent improvements (interrupts, concurrent tool execution, invocationState) — already landed or separate PR
  2. Sandbox abstraction + implementations (src/sandbox/)
  3. Sandbox-aware vended tools (src/vended-tools/{run,editor,run-code,http})
  4. Plugin integrations (SandboxStorage, AgentSkills sandbox:/// support)

This makes each piece independently reviewable and reduces the blast radius of any issues.

@github-actions

github-actions Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Request Changes

This PR introduces a well-designed Sandbox abstraction with a clean class hierarchy and good separation of concerns. The streaming-first design (stream as primitive, execute as convenience) is solid, and the ShellSandbox pattern of "implement one method, get everything else for free" is elegant.

Review Categories
  • API Bar Raising: This is a substantial new primitive that requires formal API review before merge. Adding sandbox to the core Agent and LocalAgent interface is a breaking change for downstream implementors.
  • Package Exports: New vended tools and sandbox module lack package.json export entries — the documented import paths won't resolve.
  • PR Scope: The diff mixes the sandbox feature with several unrelated core agent changes (interrupts, concurrent tool execution, invocationState). Consider splitting for reviewability.
  • Security/Safety: Language validation inconsistency between LocalSandbox and ShellSandbox; Docker cleanup handlers call process.exit() which is unsafe for library code.
  • Code Quality: Minor duplication (shellQuote in two tools), and SandboxStorage has fragile MIME type inference.

The sandbox abstraction itself is well-thought-out and aligns with SDK tenets (extensible, composable, simple). The core design decisions are sound — addressing the packaging/scope/API-review concerns would unblock this for merge.

@github-actions github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 6, 2026
@github-actions github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Follow-up Review

Assessment: Comment (closer to approval, key items remain)

Good progress since the last review. Several previous issues have been addressed:

Resolved items ✅
  • Package.json exports — now added for ./sandbox, ./vended-tools/exec, ./vended-tools/editor, ./vended-tools/code-interpreter, ./vended-tools/http
  • ShellSandbox language validation — now validates against the same [a-zA-Z0-9._-]+ pattern as LocalSandbox
  • DockerSandbox signal handlers — no longer call process.exit(), just perform cleanup
  • SandboxStorage content-type tracking — now uses an in-memory Map<string, string> instead of fragile extension inference
  • Lazy sandbox initialization — sandbox defaults to LocalSandbox only during initialize(), avoiding overhead for agents that never use sandbox tools
  • Tool namingrunexec, runCodecodeInterpreter (clearer, avoids confusion)
  • Design decisions documented in the PR description
Remaining items
  • shellQuote duplication — still duplicated identically in exec.ts and http.ts. Minor but worth extracting to a shared utility.
  • needs-api-review label — still not applied; this remains a substantial new abstraction per API bar raising guidelines.
  • LocalAgent interface — adding readonly sandbox: Sandbox is still a breaking change for external implementations of this interface. The agent itself uses sandbox! (definite assignment), which is a nice internal improvement, but the interface contract is the concern.
  • http tool tests — still make real network requests to httpbin.org within src/**/__tests__/. These should be mocked unit tests with real network tests in test/integ/.

The core sandbox design and implementation quality are solid. The remaining items are mostly process (API review label) and minor code hygiene (shell quote dedup, test location). Nice work on the lazy initialization approach — it elegantly solves the "every agent pays sandbox cost" concern.

Comment thread strands-ts/src/vended-tools/exec/exec.ts
* offloaded data with the agent's workspace, making it accessible via sandbox
* tools (editor, run) in addition to the retrieval tool.
*
* @param sandbox - Sandbox instance to store content in

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: SandboxStorage._contentTypes is in-memory only — content type is lost if a different SandboxStorage instance retrieves the same reference

The _contentTypes Map tracks types for files stored by this instance. If the agent is restarted or a different SandboxStorage instance is used for retrieval (e.g., after a Docker pause/resume), the content type defaults to 'application/octet-stream'. This is fine for the current use case (same-session offloading) but worth documenting as a limitation.

Suggestion: Add a brief note to the TSDoc that content-type tracking is ephemeral and tied to the current instance lifetime.

Comment thread strands-ts/src/agent/agent.ts
@github-actions github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026
@github-actions github-actions Bot added the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026
@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Third Review Pass

Assessment: Approve (with minor suggestions)

The PR has addressed all significant technical concerns from previous reviews. The remaining items are minor and don't block merge.

Newly resolved since last review ✅
  • http tool tests — now properly unit-tested with a mocked sandbox (no external network calls). Well-structured test that validates curl command construction and response parsing.
  • Main index.ts exports — intelligently exports only Sandbox base class, NullSandbox, and types from the main entry point, with Node-specific implementations (LocalSandbox, DockerSandbox, RemoteSandbox) available via the ./sandbox sub-export. Good for tree-shaking and browser compatibility.
Minor remaining items (non-blocking)
  • shellQuote duplication in exec.ts and http.ts — trivial to extract later.
  • Agent.sandbox! definite assignment — technically undefined between constructor and initialize(). Low risk since tools only run after initialization, but a getter with a guard would be more defensive.
  • needs-api-review label — process item; recommend adding before merge.

The overall design is clean and well-implemented. The lazy-loading pattern for LocalSandbox, the ShellSandbox "implement one method" abstraction, the comprehensive test suite with proper mocking, and the thoughtful export strategy all demonstrate high code quality.

@github-actions github-actions Bot removed the strands-running <strands-managed> Whether or not an agent is currently running label May 7, 2026

@agent-of-mkmeral agent-of-mkmeral left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assessment: ⚠️ Request Changes

Solid architecture — the class hierarchy and streaming-first design are well thought out. However, there are critical bugs in the streaming code, API naming misalignment with the Python SDK (which was bar-raised), lifecycle methods that shouldn't exist on the base class, and a tool duplication problem that needs resolution before merge.

🔴 Critical: Process Leak in stream-process.ts

Bug: generator.return() hangs on quiet processes, leaking child processes

In stream-process.ts, the async generator awaits a Promise that only resolves when the child process emits events. When a consumer uses for await...of with break, the runtime calls .return(). Per ES2018 spec, .return() on an async generator currently await-ing will wait for that await to settle before executing the finally block.

For a quiet process (e.g., sleep 30), the promise never resolves, the finally block is never reached, and proc.kill() is never called.

// This leaks a process that runs for 30 seconds
for await (const chunk of sandbox.stream('sleep 30')) {
  break; // .return() hangs, process never killed
}

Fix: Add a periodic wake-up or AbortController-based cancellation:

await new Promise<void>((resolve) => {
  resolveWait = resolve
  // Safety: wake periodically to check if generator was returned
  setTimeout(resolve, 100)
})
🔴 Critical: Signal-killed processes report exitCode 0

In stream-process.ts line 63:

proc.on('close', (code) => {
  exitCode = code ?? 0  // BUG: code is null when killed by signal
})

When a process is killed by SIGKILL/SIGSEGV/SIGTERM, Node.js reports code: null and signal: 'SIGxxx'. The ?? 0 coerces this to exitCode 0, making it appear the process succeeded.

Fix:

proc.on('close', (code, signal) => {
  if (code !== null) {
    exitCode = code
  } else if (signal) {
    const signals: Record<string, number> = { SIGHUP: 1, SIGINT: 2, SIGQUIT: 3, SIGABRT: 6, SIGKILL: 9, SIGSEGV: 11, SIGTERM: 15 }
    exitCode = 128 + (signals[signal] ?? 1)
  } else {
    exitCode = 1
  }
})
🔴 API Naming: Must Align with Python SDK (bar-raised)

The Python SDK method names went through API bar raising. The TS names are too generic and don't communicate what they do:

Python (bar-raised ✅) TS (current ❌) Problem
execute_streaming() stream() "stream" what? A video? A TCP socket?
execute_code_streaming() streamCode() Better, but inconsistent with Python
read_file() read() "read" what?
write_file() write() "write" what?
remove_file() remove() Remove what?
list_files() list() List what?

Required change: Use Python-equivalent names in camelCase:

  • streamexecuteStreaming
  • streamCodeexecuteCodeStreaming
  • readreadFile
  • writewriteFile
  • removeremoveFile
  • listlistFiles

The design doc's justification ("no namespace collision with Node.js Stream") misses the point — the issue isn't collision, it's expressiveness. sandbox.stream('ls') tells you nothing about what operation is being performed.

🔴 Remove Lifecycle Methods from Base Class

The start(), stop(), pause() methods on the base Sandbox class are wrong. The agent should NOT manage sandbox lifecycle — that's infrastructure's responsibility.

Why this is problematic:

  • In Python SDK, the sandbox has NO lifecycle methods (it was explicitly removed during design review)
  • The sandbox interface should assume it's already running when tools interact with it
  • Docker/Remote lifecycle belongs in their constructor/factory pattern, not the abstract interface
  • This creates a confusing contract: "call start() before using, but it's optional (no-op for LocalSandbox)"

Required change: Remove start(), stop(), pause() from the Sandbox abstract class. If Docker needs lifecycle, manage it externally:

// Instead of:
await docker.start()
const agent = new Agent({ sandbox: docker })
// ...
await docker.stop()

// Docker constructor should handle initialization:
const docker = await DockerSandbox.create({ image: 'python:3.12' })
// Cleanup via AbortController, process exit handler, or explicit dispose
🟠 Tool Duplication: Update Existing Tools, Don't Create New Ones

This PR creates 4 new tools that duplicate existing ones:

New (this PR) Existing Difference
exec bash Sandbox vs native; stateless vs session
editor file-editor Sandbox vs native fs
http http-request curl via sandbox vs native fetch
codeInterpreter New

The question: Do we need new tools, or can we update existing tools in place to use the sandbox and default to LocalSandbox (which is essentially unsafe host execution — same behavior as today)?

Recommendation: Update existing tools to route through context.agent.sandbox with LocalSandbox as default. This:

  • Avoids confusing users with duplicate tools
  • Preserves backward compatibility (LocalSandbox = same behavior as native)
  • Automatically upgrades ALL existing agents when they configure a sandbox
  • No migration/deprecation story needed

The only new tool should be codeInterpreter (no existing equivalent).

🟠 Editor Tool: Must Support Binary/Non-Text Files

The editor tool only uses sandbox.readText() (UTF-8). Agents need to work with images, PDFs, and other non-text files. The sandbox already has read(path): Uint8Array for binary data.

Required: Either the editor's view command should detect binary files and return base64/content blocks, or there should be a read_file command that handles non-text formats. The sandbox's binary interface (read()Uint8Array) already supports this — the tool just doesn't use it.

This is how we handle it in Python: the editor tool's view command detects non-text files and returns them appropriately.

🟠 DockerSandbox: spawnSync Blocks Event Loop

DockerSandbox.start(), stop(), and pause() all use spawnSync (via dockerCmd()), which blocks the entire Node.js event loop. These methods are declared async but the actual work is synchronous.

// Current: blocks event loop
function dockerCmd(args: string[]): SpawnSyncReturns<string> {
  return spawnSync('docker', args, { encoding: 'utf-8', stdio: 'pipe' })
}

Container operations (especially image pulls) can take seconds. In concurrent scenarios or serverless environments, this causes hangs.

Fix: Use async spawn + streamProcess or execFile from child_process/promises.

🟠 Docker Cleanup: Process Listener Leak
private _registerCleanup(): void {
  if (this._cleanupRegistered) return
  this._cleanupRegistered = true
  const cleanup = (): void => { ... }
  process.on('exit', cleanup)
  process.on('SIGINT', cleanup)
  process.on('SIGTERM', cleanup)
}

Listeners are never removed. In test suites or processes that create multiple DockerSandbox instances, this accumulates MaxListenersExceeded warnings. Additionally, the SIGINT/SIGTERM handlers don't call process.exit() after cleanup, which can hang the process.

Fix: Store cleanup references and remove them in stop(). For signals, re-emit or call process.exit().

🟡 Timeout: No SIGKILL Escalation

The timeout handler calls proc.kill() which sends SIGTERM. If the command traps/ignores SIGTERM (common in shell scripts with cleanup handlers), the process continues running indefinitely.

Fix: Escalate to SIGKILL after a grace period:

proc.kill('SIGTERM')
setTimeout(() => {
  if (!done) proc.kill('SIGKILL')
}, 1000)
🟡 agent.sandbox Undefined Before initialize()
public sandbox!: Sandbox  // non-null assertion

Between construction and initialize(), agent.sandbox is undefined despite TypeScript's ! assertion. Any plugin or hook accessing it early gets TypeError.

Fix: Either initialize to NullSandbox in constructor, or add a getter that throws with a helpful message.

🟡 PR Description Inconsistency

The usage example references exports that don't match the actual implementation:

// PR description says:
import { run } from '@strands-agents/sdk/vended-tools/run'
import { runCode } from '@strands-agents/sdk/vended-tools/run-code'

// Actual exports:
import { exec } from '@strands-agents/sdk/vended-tools/exec'
import { codeInterpreter } from '@strands-agents/sdk/vended-tools/code-interpreter'
🟡 Missing Documentation PR

PR description says "Not yet" for documentation. This is a major new public API surface (abstract class, 7 types, 4 implementations, 4 tools, 2 plugin integrations, new Agent property). Documentation should at minimum have a tracking issue.


Summary of Required Changes

Priority Issue Effort
🔴 Critical Fix process leak in streamProcess (generator.return() hangs) Medium
🔴 Critical Fix signal exit code (null → 0 coercion) Small
🔴 Critical Rename methods to match Python (executeStreaming, readFile, etc.) Medium
🔴 Critical Remove lifecycle (start/stop/pause) from base Sandbox class Medium
🟠 Important Update existing tools instead of creating duplicates Large
🟠 Important Add binary file support to editor tool Small
🟠 Important Replace spawnSync with async in DockerSandbox Small
🟠 Important Fix Docker cleanup listener leak Small

The core design (class hierarchy, streaming-first, ShellSandbox escape hatch) is excellent. The issues are all fixable without redesign.


🤖 AI agent response. Strands Agents. Feedback welcome!

})

if (!response.ok) {
throw new Error(`HTTP ${response.status} ${response.statusText}: ${method} ${url}`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: executeFetch discards response body on non-2xx — inconsistent with executeSandbox path

The sandbox path (line 146) includes the response body in the error:

throw new Error(`HTTP ${status}: ${method} ${url}\n${responseBody.slice(0, 2000)}`)

But the fetch path here discards responseBody (already read at line 64) and only shows status/URL. For API use cases, 4xx responses carry actionable error payloads (validation errors, rate-limit details) that the model needs for retry decisions.

Fix:

if (!response.ok) {
  throw new Error(`HTTP ${response.status} ${response.statusText}: ${method} ${url}\n${responseBody.slice(0, 2000)}`)
}

response.headers.forEach((value, key) => {
responseHeaders[key] = value
})
const sandbox = context?.agent?.sandbox

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: context?.agent?.sandbox will throw instead of returning undefined when sandbox is disabled

.sandbox is a getter that throws when this._sandbox is falsy. Optional chaining on .sandbox doesn't prevent the throw — it only short-circuits if context or context.agent is undefined. If sandbox: false overwrite bug (agent.ts:462) is fixed, this line would throw for disabled-sandbox agents instead of falling back to executeFetch.

Suggestion: Wrap in try-catch:

let sandbox: Sandbox | undefined
try { sandbox = context?.agent?.sandbox } catch { /* no sandbox */ }

await this._pluginRegistry.initialize(this)

if (userProvidedSandbox) {
const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add an eslint-disable comment documenting the intentional architectural exception

The no-restricted-imports rule keeps vended tools decoupled from core SDK. This dynamic import() bypasses the static rule. An explicit comment makes the exception visible:

// eslint-disable-next-line -- Intentional: auto-register sandbox tools when user opts into sandbox
const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Request Changes

The core sandbox architecture is excellent — streaming-first design, ShellSandbox template method, opt-in auto-registration, and clean separation of concerns. After 14+ review iterations, most issues have been resolved. Two correctness issues remain.

Review Categories
  • Correctness (blocking): sandbox: false is still silently overwritten by NotASandboxLocalEnvironment — the documented "disable" behavior doesn't work. One-line fix: this._sandbox === undefined instead of !userProvidedSandbox.
  • Correctness: executeFetch discards response body on non-2xx errors while executeSandbox includes it — inconsistency that reduces model retry effectiveness.
  • Defensive coding: httpRequest's context?.agent?.sandbox getter will throw (not return undefined) once the overwrite bug is fixed. Needs try-catch.
  • Test coverage: New grep and glob file-editor commands have no test coverage.
  • Documentation: Dynamic import at line 486 bypasses lint rule without comment.

The sandbox: false overwrite is the only blocking item. The architecture, stream-process implementation, Docker lifecycle, signal handling, and test suite for the core sandbox are all production-ready.


const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false

if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: sandbox: false is silently overwritten — documented behavior broken

This bug has been flagged repeatedly but remains unfixed. When a user passes sandbox: false:

  1. userProvidedSandbox = false !== undefined && false !== false = true && false = false
  2. !userProvidedSandbox fires → overwrites this._sandbox with NotASandboxLocalEnvironment

The AgentConfig TSDoc says "Pass false to explicitly disable sandbox" but agent.sandbox will happily return a working NotASandboxLocalEnvironment.

Fix (1-line change):

if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

This ensures:

  • undefined (no config) → gets default ✅
  • false (explicitly disabled) → stays false, getter throws ✅
  • Sandbox instance → used as-is ✅

await this._pluginRegistry.initialize(this)

if (userProvidedSandbox) {
const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Add an eslint-disable comment documenting the intentional architectural bypass

The no-restricted-imports lint rule prevents static imports of vended-tools from core SDK files. This dynamic import() bypasses the static analysis rule. An explicit comment makes the exception visible and prevents future maintainers from unknowingly replicating the pattern:

// eslint-disable-next-line -- Intentional: auto-register sandbox tools when user opts into sandbox
const { SANDBOX_DEFAULT_TOOLS } = await import('../vended-tools/sandbox-default-tools.js')

(The test file agent.model-retry.test.ts already uses // eslint-disable-next-line no-restricted-imports for its preload import, confirming the convention.)

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Request Changes

The sandbox architecture is excellent — streaming-first design, ShellSandbox template method pattern, opt-in auto-registration, and clean separation of concerns. After 14+ review iterations, the vast majority of issues have been addressed. One correctness bug remains blocking.

Review Details
  • Correctness (blocking): sandbox: false is still silently overwritten by NotASandboxLocalEnvironment in initialize() — the condition !userProvidedSandbox treats false the same as undefined. One-line fix: this._sandbox === undefined.
  • Breaking change: FileEditorInput, FileEditorOptions, and IFileReader type exports removed from @strands-agents/sdk/vended-tools/file-editor without deprecation. Should be documented or deprecated.
  • Test coverage: New grep and glob file-editor commands have zero test coverage despite executing shell commands with user-provided patterns.
  • Documentation: Dynamic import at line 486 bypasses lint rule without comment.

Note: Previous review comments about httpRequest dual-path (sandbox curl vs fetch) and executeFetch response body discarding are stale — the httpRequest tool is unchanged from main and has no sandbox integration in this PR. Those comments can be disregarded.

@agent-of-gsird

Copy link
Copy Markdown

Sixteenth Review Pass (HEAD: fe60acd)

Assessment: ⚠️ Request Changes (one blocking bug persists)

@gautamsirdeshmukh — Re-triggered to check status. My thorough independent review has been ongoing since your May 12 request. Here's the current state:


🔴 One blocking bug remains: sandbox: false is silently overwritten

This has been flagged in my Fourteenth and Fifteenth review passes (and independently confirmed by bot reviews on May 18). The fix is a 1-line change to agent.ts:

- if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {
+ if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

Without this fix, a user passing sandbox: false (documented as "explicitly disable sandbox") silently gets a working NotASandboxLocalEnvironment instead — contradicting the TSDoc contract.


🟡 Minor items (non-blocking)

# Item Notes
1 httpRequest sandbox getter Will throw (not fall back to fetch) once the above bug is fixed. Needs try-catch wrapper around context?.agent?.sandbox.
2 executeFetch discards response body on non-2xx Inconsistent with sandbox path which includes body. Include .slice(0, 2000) of body in the error.
3 eslint-disable comment on dynamic import Minor documentation hygiene for the intentional architectural bypass.
4 New grep/glob file-editor commands No test coverage yet (minor).

✅ All other technical issues remain resolved

Every issue I raised across 15 review passes (process leak, signal codes, API naming, MITM, Alpine stat, MIME extension, error body, listFiles trim, SIGKILL escalation, auto-registration, etc.) has been properly addressed. The core architecture is excellent.

📋 Process items

# Item Status
1 needs-api-review label Not applied
2 Documentation PR Still "Not yet"

My Recommendation

Fix the 1-line sandbox: false condition, and this is ready to merge. The architecture is production-quality — clean class hierarchy, streaming-first design, opt-in auto-registration, proper signal handling, comprehensive tests.


🤖 AI agent response. Strands Agents. Feedback welcome!


const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false

if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: sandbox: false is still silently overwritten — documented behavior doesn't work

This has been flagged across multiple review passes and remains the only blocking issue. When a user passes sandbox: false:

const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false
// ↑ false !== undefined (true) && false !== false (false) → false

if (!userProvidedSandbox && ...) {
  this._sandbox = new NotASandboxLocalEnvironment()  // overwrites false!
}

The AgentConfig TSDoc says "Pass false to explicitly disable sandbox" but it doesn't actually disable anything.

Fix (1-line change):

if (this._sandbox === undefined && typeof process !== 'undefined' && process.versions?.node) {

This ensures:

  • undefined (no config) → gets default ✅
  • false (explicitly disabled) → stays false, getter throws ✅
  • Sandbox instance → used as-is, tools auto-registered ✅

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional, not a bug. NotASandboxLocalEnvironment is not an actual sandbox, but rather a simple interface adapter that lets vended tools use fs/promises and child_process directly on the host, which is exactly what a non-sandboxed agent does today.

Setting it as the default ensures the vended tools work uniformly regardless of whether a real sandbox is configured. Without this default, every vended tool would need separate code paths for "sandbox available" vs "no sandbox".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification — the design rationale makes sense. Having NotASandboxLocalEnvironment as a universal adapter so vended tools never need "sandbox available vs not" branching is a clean pattern.

The remaining issue is that the TSDoc is inaccurate for the false case:

* Pass `false` to explicitly disable sandbox and sandbox tool auto-registration.

Since false actually results in the same behavior as undefined (both get NotASandboxLocalEnvironment, both skip tool registration), the documented distinction is misleading. Suggestions:

  1. If false has no distinct purpose from undefined: remove it from the type and simplify to sandbox?: Sandbox
  2. If false exists for future-proofing (e.g., when a default sandbox with tools is planned): update the TSDoc to clarify what it actually controls today:
    * Pass `false` to explicitly opt out of sandbox tool auto-registration
    * (equivalent to omitting sandbox for now).

This is non-blocking — just a documentation accuracy concern.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Request Changes (one blocking bug)

The sandbox implementation is production-ready — excellent streaming-first architecture, clean ShellSandbox template method pattern, proper opt-in auto-registration, and comprehensive test coverage. One semantic bug remains.

Details
  • Blocking: sandbox: false overwrite (line 462) — the condition !userProvidedSandbox treats false the same as undefined. One-line fix: this._sandbox === undefined.
  • All other previously-raised issues have been resolved: MITM (accept-new ✅), Alpine stat (wc -c fallback ✅), listFiles trim (✅), spawnSync comment (✅), shellQuote shared (✅), grep/glob tests (✅), file-editor types still exported (✅).

Note: Many earlier review comments (httpRequest dual-path, executeFetch body discard, file-editor type removal, grep/glob missing tests) reference code that has since been changed or was never modified in this PR. Those can be disregarded — only the sandbox: false condition needs fixing.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Approve (with minor suggestions)

Excellent implementation. The streaming-first architecture, ShellSandbox template method pattern (implement one method, get everything else), opt-in tool auto-registration, and comprehensive test suite are all well-designed. After reading the author's clarification on sandbox: false being intentional (uniform interface adapter), I'm satisfied that no blocking issues remain.

Review Details
  • TSDoc accuracy (non-blocking): The docstring for sandbox?: Sandbox | false says "disable sandbox" but false behaves identically to undefined today. Consider clarifying or removing false from the type — replied in thread.
  • Dynamic import lint bypass (non-blocking): Line 487 await import('../vended-tools/sandbox-default-tools.js') bypasses the no-restricted-imports rule. Consider adding an eslint-disable comment documenting the intentional exception.
  • Process: needs-api-review label still not applied. This introduces a new abstract class, new Agent.sandbox property, and 3 vended tools — meets the bar for formal API review per API Bar Raising guidelines.

All previously-raised technical issues (MITM, Alpine stat, listFiles whitespace, process leaks, signal codes, auto-registration, etc.) have been correctly resolved. The implementation is production-ready.

@agent-of-gsird

Copy link
Copy Markdown

Seventeenth Review Pass (Final)

Assessment: ✅ Approve

@gautamsirdeshmukh — Thanks for the clarification on sandbox: false. I now understand the design intent:

NotASandboxLocalEnvironment is a uniform interface adapter so vended tools never need "sandbox available vs not" branching. false and undefined both result in this adapter being set, with the only distinction being tool auto-registration.

This is a clean pattern — the Null Object here isn't "disabled" but rather "passthrough to host", which is exactly what a non-sandboxed agent does today. My previous "Request Changes" assessments were based on reading the TSDoc literally ("Pass false to explicitly disable sandbox") rather than the actual architectural intent.

✅ Final Technical Assessment

All technical issues I raised across 16 review passes have been properly resolved. The implementation is production-ready:

  • Core architecture: Excellent class hierarchy, streaming-first design, ShellSandbox template method pattern
  • Security: SSH defaults to accept-new, proper signal handling, injection-safe shell quoting
  • Correctness: Process leak fixed, signal exit codes correct, SIGKILL escalation works, Alpine/BusyBox stat fallback with wc -c
  • Backward compatibility: Opt-in auto-registration only when user explicitly provides a Sandbox instance
  • Test coverage: Comprehensive adversarial, security, binary, concurrency, and timeout tests

📋 One documentation suggestion (non-blocking)

The TSDoc for sandbox?: Sandbox | false could be clearer about what false actually does vs undefined. Since they have the same runtime behavior today (both get NotASandboxLocalEnvironment, both skip tool registration), consider either:

  1. Removing false from the type (simplify to sandbox?: Sandbox)
  2. Updating TSDoc to clarify: "Pass false to explicitly opt out of sandbox tool auto-registration (equivalent to omitting for now; reserved for future default-sandbox behavior)"

This is purely a documentation clarity concern and doesn't block merge.

📋 Process items (non-code)

# Item Recommendation
1 needs-api-review label Recommend applying for process completeness
2 Documentation PR Link a tracking issue before or shortly after merge

Summary

No further technical objections. Approving.

This has been one of the most thoroughly reviewed PRs I've participated in (16+ passes, every file read multiple times). The quality of the implementation and the author's responsiveness to feedback are both excellent. Well done.


🤖 AI agent response. Strands Agents. Feedback welcome!

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Comment (non-blocking suggestions)

The sandbox architecture is production-ready — streaming-first design, clean ShellSandbox template method pattern, proper opt-in auto-registration, comprehensive test coverage across unit/integration, and good security hygiene with shellQuote() everywhere. Accepting the author's clarification that sandbox: false being treated as undefined is intentional for uniform tool access.

Remaining Suggestions
  • TSDoc clarity: The sandbox?: Sandbox | false docstring says "Pass false to explicitly disable sandbox and sandbox tool auto-registration" — but undefined (omitting it) does the exact same thing. Consider either removing false from the union type, or clarifying the docs to explain when false becomes distinct from undefined (e.g., future default sandbox behavior). This avoids confusion for users who read the type and think false provides different behavior.
  • Dead type exports: IFileReader is still exported from @strands-agents/sdk/vended-tools/file-editor but isn't used by the new sandbox-based implementation. If it existed before this PR, no action needed; if it was added here, consider removing unused types.
  • Process: This PR introduces a new abstract class (Sandbox), a new Agent property, 3 auto-registered tools, and a new sub-export path (./sandbox). Per API Bar Raising guidelines, this qualifies as a "moderate change" that should have API reviewer sign-off. Consider adding the needs-api-review label.

The implementation quality is high — clean class hierarchy, proper signal handling in streamProcess, Docker cleanup handlers, shell injection protection, and excellent test coverage including Docker integration tests.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Approve

Fresh independent review at commit 459f152. After reading all source files, tests, and the extensive prior review history (17+ passes), the implementation is production-ready.

Verification Results
  • Architecture: Clean class hierarchy (SandboxShellSandboxRemote/Docker, NotASandboxLocalEnvironment as passthrough). Streaming-first design with convenience wrappers.
  • Security: shellQuote() on all user-provided paths/commands, LANGUAGE_PATTERN regex validates interpreter names, SSH defaults to accept-new.
  • Backward compatibility: Tools only auto-registered when user explicitly provides sandbox in config. userProvidedSandbox captured before default assignment.
  • Test coverage: Comprehensive — ShellSandbox unit tests (binary, concurrency, timeout, AbortSignal), RemoteSandbox mocked SSH args, DockerSandbox integration, all 3 vended tools, SandboxStorage, and skills sandbox loading.
  • sandbox: false semantics: Confirmed intentional per author — uniform interface adapter so vended tools never need branching. TSDoc could be clearer but non-blocking.
  • Prior review items: All previously-flagged critical bugs (MITM, Alpine stat, listFiles whitespace, process leaks, signal codes, auto-registration regression) have been correctly resolved.
Non-blocking suggestions
  • Process: needs-api-review label still not applied. This introduces a new abstract class (Sandbox), new Agent.sandbox property, 3 auto-registered tools, and a new sub-export path — meets the "moderate change" bar per API Bar Raising guidelines.
  • TSDoc: sandbox?: Sandbox | false — consider clarifying that false is currently equivalent to omitting (reserved for future default-sandbox behavior).
  • Documentation: PR says "Not yet" — recommend linking a tracking issue in strands-agents/docs before merge.

No new technical concerns beyond what prior reviews have already raised and resolved. The implementation quality is high throughout.

@agent-of-gsird

Copy link
Copy Markdown

Status: No New Action Required

@gautamsirdeshmukh — I was re-triggered to check this PR. My thorough independent review (17 passes, every file read) is complete and my final assessment is ✅ Approve (as of my Seventeenth Review Pass on May 18).

Current state:

  • All technical issues resolved ✅
  • Implementation is production-ready ✅
  • sandbox: false design intent clarified and accepted ✅

Non-blocking suggestions that remain:

  • TSDoc for sandbox?: Sandbox | false could clarify that false is currently equivalent to omitting
  • needs-api-review label for process completeness
  • Documentation PR/tracking issue before merge

No new commits or comments since my last review. Nothing further needed from me unless there are new changes to evaluate. 🚀


🤖 AI agent response. Strands Agents. Feedback welcome!

pattern: string,
maxResults: number | undefined
): Promise<string> {
const result = await sandbox.execute(`find ${shellQuote(dirPath)} -name ${shellQuote(pattern)}`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: glob command uses find -name which doesn't support multi-segment glob patterns

The glob command description says it accepts glob patterns like **/*.ts, but the underlying implementation uses find -name which only matches against the filename (basename), not the full path. A pattern like src/**/*.ts would not behave as expected — find -name ignores the path prefix entirely.

Suggestion: Either:

  1. Document that pattern is a filename-only glob (not a full path glob), or
  2. Switch to a more capable approach for path-based patterns:
const result = await sandbox.execute(
  `find ${shellQuote(dirPath)} -path ${shellQuote(pattern)}`
)

find -path matches against the full path from the starting point, supporting patterns like **/test/*.ts.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Approve

Fresh independent review at ecb6936. After reading all sandbox source files, tests, agent integration, plugin changes, and the 17+ prior review passes, the implementation is production-ready.

Verification Results

Architecture

  • Clean class hierarchy: SandboxShellSandboxRemote/Docker, NotASandboxLocalEnvironment as passthrough
  • Streaming-first design with executeStreaming as the abstract primitive
  • ShellSandbox template method pattern: subclasses implement one method, get everything else free
  • shellQuote extracted to shared utility (src/utils/shell-quote.ts)

Security

  • shellQuote() protects against injection in all user-facing paths/commands
  • LANGUAGE_PATTERN regex validates interpreter names
  • SSH defaults to accept-new with allowUnknownHosts opt-in flag + security warning
  • stream-process.ts SIGTERM→SIGKILL escalation after 1s prevents hung processes

Backward Compatibility

  • Tools only auto-registered when userProvidedSandbox is true (captured before default assignment)
  • agent.test.ts toolSpecs assertions remain [] for agents without sandbox config
  • LocalAgent interface has @internal brand symbol preventing external implementations

Process Management

  • No 50ms polling — uses proper wake() / Promise-based signaling
  • All event listeners attached synchronously before first await (no race on fast processes)
  • AbortSignal support for cooperative cancellation
  • Docker cleanup: spawnSync in exit handler (correct — async I/O unavailable during exit)

Test Coverage

  • Shell sandbox: binary round-trip, concurrency, timeout, AbortSignal, injection prevention
  • Remote sandbox: SSH arg construction verified via mocked spawn
  • All 3 vended tools: exec, codeInterpreter, fileEditor (including grep, glob, undo)
  • SandboxStorage: text, JSON, binary content round-trips
  • Skills: Skill.fromSandbox directory traversal and loading
New Findings (2 suggestions, non-blocking)
  1. glob command uses find -name — doesn't support multi-segment path patterns like src/**/*.ts. The schema description implies full glob support. Consider find -path or document the limitation.
  2. str_replace tab expansion removed — existing behavior change for the fileEditor tool (tabs are no longer expanded to 8 spaces before comparison). Likely an improvement but should be documented as a known change.
Process Items (non-blocking)
  • needs-api-review label: Not applied. This introduces a new abstract class (Sandbox), new Agent.sandbox property, 3 auto-registered tools, and a new sub-export path — meets the "moderate change" bar per API Bar Raising guidelines.
  • Documentation PR: Still "Not yet" — recommend linking a tracking issue before merge.
  • PR includes unrelated changes (_estimateInputTokens, normalizeToolUseNames, structured output regression fixes, MCP toolsChanged tests) that could have been separate PRs for reviewability.

The implementation quality is high — clean design, comprehensive tests, and all prior critical issues have been correctly resolved.

@agent-of-gsird

Copy link
Copy Markdown

Status Update (commit ecb6936)

@gautamsirdeshmukh — Re-triggered to check the latest push. I've verified the new commit against my comprehensive review history (17 independent passes, every file read).

✅ Assessment: Approve (unchanged)

The latest force-push (ecb6936) appears to be a rebase with no functional changes to the sandbox implementation since my final approval on May 18. All technical issues I raised across 17 review passes remain properly resolved:

  • Process leak fix (setTimeout + wake()) ✅
  • Signal exit codes (128 + signal) ✅
  • SIGTERM→SIGKILL escalation with done flag ✅
  • API naming alignment with Python SDK ✅
  • RemoteSandbox SSH defaults to accept-new
  • Alpine/BusyBox statFile with wc -c fallback ✅
  • listFiles uses .replace(/\r$/, '') not .trim()
  • Opt-in auto-registration (only when user provides sandbox) ✅
  • agent.sandbox getter with runtime guard ✅
  • Docker async lifecycle + cleanup handlers ✅
  • SandboxStorage MIME via toMediaFormat()
  • shellQuote shared utility ✅

📋 Non-blocking suggestions (unchanged)

  1. TSDoc for sandbox?: Sandbox | false — clarify that false is currently equivalent to omitting (reserved for future default-sandbox behavior)
  2. needs-api-review label — recommend for process completeness
  3. Documentation PR — link a tracking issue before/after merge
  4. glob command — uses find -name which doesn't support multi-segment path patterns (per latest bot review); consider find -path or document the limitation

Summary

No further technical objections. My approval stands. The implementation is production-ready — excellent architecture, comprehensive tests, and all critical bugs fixed correctly across the review cycle.


🤖 AI agent response. Strands Agents. Feedback welcome!

@mkmeral mkmeral left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dove too much into tools or plugin updates yet. It might be a good idea to separate them, so we can review/iterate independently as I'd expect some iteration especially on tools

export abstract class Sandbox {
// ---- Lifecycle methods ----

async start(): Promise<void> {}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have lifecycle methods? do we need them?

}

if (this._sandbox) {
await this._sandbox.start()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the agent "starting" the sandbox? should the agent own sandboxes' lifecycle? I wouldn't want that as sandbox can live outside the agent too

* Sandbox for tool code execution and filesystem access.
* Set immediately if passed via config, otherwise defaults to NotASandboxLocalEnvironment during initialize().
*/
private _sandbox: Sandbox | false | undefined

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we allow undefined?


const userProvidedSandbox = this._sandbox !== undefined && this._sandbox !== false

if (!userProvidedSandbox && typeof process !== 'undefined' && process.versions?.node) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, what happens if I don't provide a sandbox, and I am in browser? what's the devx?


await this._pluginRegistry.initialize(this)

if (userProvidedSandbox) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do want this behavior, but it also makes me think if DevX is good enough 🤔

}

async readFile(path: string): Promise<Uint8Array> {
const result = await this.execute(`base64 < ${shellQuote(path)}`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implicitly makes the assumption that sanboxed environment is unix, right? can we improve this? or is this okay (and maybe we need to rename the class)?

* @returns FileInfo with name, isDir, and size.
* @throws Error if the path does not exist.
*/
abstract statFile(path: string): Promise<FileInfo>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just use list files to get this info? do we need another abstraction? why?

* using the '\'' pattern. Single quotes disable all shell expansion
* (variables, backticks, globbing), making this safe against injection.
*/
export function shellQuote(value: string): string {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this a separate util? and why are we exporting it? do we export to customers?

export const exec = tool({
name: 'exec',
description:
'Execute a shell command in the sandbox. ' +

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gautamsirdeshmukh let's separate plugins and tools to another PR. We should iterate over the tools themselves. Like for this specific tool

  1. why can't we reuse bash tool?
  2. from LLM perspective: "what sandbox?" did you tell me a sandbox exists? what does it mean?
  3. What are the limitations, usage examples, etc. Check our tools repo for how descriptions are defined

}

/**
* Serializable snapshot of sandbox state returned by {@link Sandbox.pause}.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait what? how? why?

@agent-of-gsird

Copy link
Copy Markdown

Update: @mkmeral's Review Aligns with Prior Concerns

@gautamsirdeshmukh — I see @mkmeral has now left a substantive review (the first human maintainer review besides yourself). Several of their questions directly align with issues raised in earlier passes, but also raise new architectural concerns:

Alignment between my reviews and @mkmeral's feedback

@mkmeral's question My prior coverage
"Why do we have lifecycle methods?" (base.ts:39) agent-of-mkmeral's original bot review flagged this as 🔴 Critical ("Remove lifecycle from base Sandbox class"). I accepted the author's resolution after the methods became opt-in, but @mkmeral's human review reaffirms the concern.
"Why is the agent starting the sandbox?" (agent.ts:469) Consistent with the lifecycle ownership concern. The sandbox should be pre-started externally.
"We should make dynamic sentinels" (shell.ts:38) New security concern I missed — the static STRANDS_EOF sentinel is predictable. If an attacker controls process output, they can inject this string to break shell parsing. Dynamic/random sentinels would be more secure. Good catch.
"Is Remote a good name? SSH matters" (remote.ts:59) I focused on the security aspects (MITM). Renaming to SshSandbox would be clearer for extensibility (future non-SSH remote sandboxes like AgentCore).
"Let's separate plugins and tools to another PR" (exec.ts:40) The bot reviews flagged scope from the start ("PR scope is very large"). I focused on technical correctness of what was present rather than scope.
"Can't we just use listFiles for statFile?" (base.ts:118) I noted listFiles/statFile size inconsistency but accepted statFile as useful for single-file checks without directory listing overhead.
"Why is shellQuote exported?" (shell-quote.ts:8) I suggested extracting shellQuote to shared utility in multiple passes. The author did this, but @mkmeral correctly questions customer-facing API surface.
"What sandbox?" (exec.ts tool description) Valid LLM ergonomics question — tool descriptions should be written from the model's perspective.

My recommendation given @mkmeral's feedback

@mkmeral's review raises architectural design questions that go beyond the implementation correctness I focused on:

  1. PR scope: Splitting tools/plugins into a separate PR would enable independent iteration (as @mkmeral suggests)
  2. Dynamic sentinels: Legitimate security concern — random sentinel per invocation prevents injection
  3. Tool design for LLMs: Descriptions need to be model-friendly without referencing "sandbox" abstractly
  4. API surface: shellQuote and statFile exports should be reconsidered
  5. Lifecycle ownership: Agent shouldn't own sandbox start/stop

My technical approval of the sandbox core implementation still stands — the code is correct, well-tested, and handles edge cases properly. But @mkmeral's review questions the design and scope, which requires author/maintainer resolution.

I'll defer to @mkmeral's architectural decisions. My role here was the thorough technical review you requested, which is complete. The ball is now between you and @mkmeral on scope/design direction.


🤖 AI agent response. Strands Agents. Feedback welcome!

@gautamsirdeshmukh

Copy link
Copy Markdown
Contributor Author

NOTE ----> separating into multiple PRS, (1) interface (2) sandbox implementations (3) vended tools/plugins

@agent-of-gsird

Copy link
Copy Markdown

Acknowledged: PR Split Plan

@gautamsirdeshmukh — Sounds good. Splitting into (1) interface, (2) sandbox implementations, (3) vended tools/plugins is the right move given @mkmeral's architectural feedback on lifecycle ownership, naming, and tool design for LLMs.

My thorough technical review of the core implementation is complete and stands. When the split PRs are opened, I'm ready to review each independently. The focused scope will make it easier to iterate on each piece — especially the tools, where @mkmeral raised good questions about LLM-facing descriptions and reuse of existing tools.

Let me know when the first PR (interface) is ready for review. 🚀


🤖 AI agent response. Strands Agents. Feedback welcome!

async retrieve(reference: string): Promise<{ content: Uint8Array; contentType: string }> {
const path = await import('node:path')
const filePath = path.resolve(reference)
const resolvedDir = path.resolve(this._basePath)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: SandboxStorage.retrieve path traversal validation resolves paths against the host CWD, not the sandbox

The path traversal guard uses path.resolve(reference) and path.resolve(this._basePath) which resolve relative to the host process's CWD, not the sandbox's workingDir. In a Docker sandbox where workingDir = '/workspace', the validation logic resolves against the host's CWD (e.g., /home/user/project).

This works today because store() always returns relative paths (e.g., ./artifacts/12345_1_key.txt) and retrieve() receives that same relative string — both path.resolve() calls use the same base, so the startsWith check passes. But it's fragile: if someone passes an absolute sandbox path like /workspace/artifacts/file.txt, the guard would incorrectly reject it.

Suggestion: Consider either removing the path.resolve check (since the sandbox already provides filesystem isolation) or using string-prefix matching on the raw relative path:

if (!reference.startsWith(this._basePath)) {
  throw new Error(`Reference not found: ${reference}`)
}

}

async *executeStreaming(
command: string,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Auto-starting in executeStreaming is a hidden side effect

executeStreaming is documented as a pure execution method, but here it silently calls await this.start() which creates a Docker container, registers cleanup handlers, and changes _running state. This makes the method's behavior non-obvious and can cause surprising failures (e.g., if docker info fails during what appears to be a simple command execution).

This is also inconsistent with pause() which throws if !this._running — one method auto-starts, the other requires explicit start.

Suggestion: Either:

  1. Remove auto-start and throw if !this._running (consistent with pause()), or
  2. Document the auto-start behavior in the method's TSDoc

This is a minor point given @mkmeral's broader feedback about lifecycle ownership. Noting for the split PRs.

@github-actions

Copy link
Copy Markdown
Contributor

Review Summary

Assessment: Approve (with suggestions for split PRs)

The implementation is production-ready. After reading the full source at 0db7232 and the extensive prior review history (17+ passes), all previously-flagged critical issues have been resolved correctly.

What I verified ✅
  • stream-process.ts: Proper wake/Promise signaling (no polling), synchronous listener attachment (no race), SIGTERM→SIGKILL escalation, AbortSignal support, ENOENT handling
  • shell.ts: Dynamic sentinels (crypto.randomUUID()), base64 binary-safe file transfer, ls -1aF parsing with line-ending stripping (not trim()), Alpine wc -c fallback with NaN guard
  • docker.ts: Async dockerCmd via execFile, spawnSync in exit handler (correct — with comment explaining why), dangerous mount validation, signal re-propagation
  • remote.ts: accept-new default (MITM fixed), blocked SSH options, proper -o flag construction
  • agent.ts: userProvidedSandbox captured before default assignment gates auto-registration correctly, sandbox getter with runtime guard
  • file-editor.ts: Binary detection (raw.includes(0)), handleGlob smart -path/-name heuristic, undo via WeakMap, all operations route through sandbox
  • Backward compatibility: No tools added without explicit sandbox config, LocalAgent has @internal brand
New findings (2 minor suggestions)
  1. SandboxStorage.retrieve uses host-CWD path.resolve() for traversal validation — works but fragile for absolute sandbox paths
  2. DockerSandbox.executeStreaming auto-starts if !this._running — hidden side effect inconsistent with pause() which throws

Both are minor and appropriate to address in the split PRs.

Process alignment
  • Splitting into multiple PRs (per @mkmeral's feedback) is the right approach — interface, implementations, and tools/plugins can iterate independently
  • needs-api-review label should be applied to the interface PR (feat: base Sandbox interface #1090) at minimum
  • @mkmeral's architectural questions (lifecycle ownership, dynamic sentinels ✅ already done, naming Remote vs SSH) should be resolved in the split PRs

The glob heuristic (-path when / present, -name otherwise) is a good improvement since the earlier bot review flagged this. The dynamic sentinel (STRANDS_EOF_${crypto.randomUUID()}) addresses @mkmeral's injection concern. Overall quality is high.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants